Gaussian Belief with dynamic data and in dynamic network 
In this paper we analyse Belief Propagation over a Gaussian model in a dynamic environment. Recently, this has been proposed as a method to average local measurement values by a distributed protocol ("Consensus Propagation", Moallemi & Van Roy, 2006), where the average is available for read-out at every single node. When the underlying network is constant but the values to be averaged fluctuate ("dynamic data"), convergence and accuracy are determined by the spectral properties of an associated Ruelle-Perron-Frobenius operator. For Gaussian models on Erdos-Renyi graphs, numerical computation points to a spectral gap remaining in the large-size limit, implying exceptionally good scalability. In a model where the underlying network also fluctuates ("dynamic network"), averaging is more effective than in the dynamic data case. Altogether, this implies very good performance of these methods in very large systems, and opens a new field of statistical physics of large (and dynamic) information systems.
Message-passing algorithms have over the last two decades turned out to be an important paradigm in fields as distant as iterative decoding, image processing and AI. See [1] for the intuition behind Belief Propagation (BP) in AI, and [2, 3, 4] for more recent reviews. It has been realized that systems where message-passing algorithms are effective can often be mapped onto disordered systems in statistical physics, and that the message-passing algorithms themselves are closely related to the Bethe approximation [5]. Most applications pursued concern inference in static models: how to do this effectively (if approximately), and when these methods work. In another direction, Consensus Propagation (CP) has been proposed as a message-passing scheme to average measurement values in a network of connected nodes [6]. This is a naturally dynamic setting. In large networks, and in many scenarios of interest, one must allow the measurement values, and perhaps the network itself, to change on the same time scale as the averaging process. The two strands of inquiry are connected by the fact that CP is equivalent to BP on a class of Gauss-Markov random fields.

In this contribution we study CP both in a static network with changing measurement values (dynamic data), and in a network where the strengths of the interconnections also change (dynamic network). We will show that the method has very good scalability, i.e. that its performance degrades very slowly as the system grows. In a sense to be made precise below, performance does not degrade with size at all. This should make CP a very interesting method for aggregation tasks in large and dynamic networks, possibly competitive with alternative schemes such as gossiping [7].

From a physics perspective the salient points are the following:
(i) CP with dynamic data is (after a transient) a linear averaging process.
(ii) The kernel of this averaging process, being the linearization of Gaussian BP, is related to the second variation of the Bethe free energy of the Gauss-Markov random field.
(iii) The leading eigenvalue of the kernel is a self-averaging quantity in Erdos-Renyi networks, and it does not depend on the network size.
(iv) CP with dynamic network and dynamic data works as well as (or better than) CP with dynamic data only.

Points (ii) and (iii) imply that we identify a new random matrix construction with unexpected properties, and possibly important practical consequences. Point (iv) means concretely that dynamic data spans the slow stable (and also flat) manifold of the kernel, while dynamic network spans the fast stable manifold. Perturbations in the dynamic network directions hence relax faster than those in the dynamic data directions, which explains the good properties.

Belief Propagation (BP) and Consensus Propagation (CP) - BP is an algorithm to infer marginal probability distributions of a joint probability function [?]. It works via distributed message passing from each node of the underlying graph of the model to every neighboring node (Fig. 1). It is exact on graphs without loops (trees), and it has been shown to often yield good results in topologies that include loops [11]. The messages in BP can be seen as one-node marginal conditional probability distributions. BP therefore works best computationally when the size of the local state space is limited, e.g. for Ising spins. BP on Gauss-Markov random fields is a special case, since Gaussianity is preserved under convolution, and the BP messages can be parametrized by two real values corresponding to (conditional) mean and
FIG. 1: Illustration of the Belief Propagation message-passing scheme in a 6-node network. In the Consensus Propagation case the messages $m_{ij}$ are decomposed into messages $K_{ij}$ and $\mu_{ij}$.



(conditional) variance. A further very special property of BP on Gaussian models is that it is exact for the modes of the marginal distributions in a very wide class of models [?].

A special instance of Gaussian Belief Propagation is Consensus Propagation [6]. This algorithm computes, in a distributed way, the average $\bar y$ of some values $y_i$ gathered by the nodes $i$ of a network $G$. The Gaussian model associated with CP is given by (1).
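For reference, the following form of this model follows [6] and is consistent with the parameters named below: a normalization $Z$, a global coupling $\beta$, and local couplings $Q_{ij}$ on the edges $E$ of $G$. The conventions of the original equation (1) may differ. Setting the gradient of the exponent to zero shows that the mode $x^*$ solves a linear system involving the $Q$-weighted graph Laplacian $L_Q$. For a connected graph, $x^*$ tends to the average $\bar y$ at every node as $\beta \to \infty$:

$$
p(x) = \frac{1}{Z}\exp\Big(-\sum_{i}(x_i - y_i)^2 - \beta \sum_{(i,j)\in E} Q_{ij}\,(x_i - x_j)^2\Big),
\qquad
(I + \beta L_Q)\,x^* = y,
\qquad
\lim_{\beta\to\infty} x^*_i = \bar y .
$$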
In (1), $Z$ is a normalization, $\beta$ is a global coupling parameter and the $Q_{ij}$ are local coupling parameters. BP on (1) is guaranteed to converge for any finite $\beta$. The modes of the one-node marginals computed by BP converge to the average $\bar y$ as $\beta$ tends to infinity (as follows from [6]). In this way estimates of $\bar y$ can be obtained, but a trade-off must be made between convergence time and accuracy: a larger $\beta$ gives modes closer to $\bar y$, at the cost of slower convergence.

The algorithm - The following message update rules 
define Consensus Propagation: 
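For reference, the CP message updates and consensus belief of Moallemi & Van Roy [6] can be written as follows, in the notation introduced below. We assume these correspond to the topology update (2), the local state update (3) and the belief (4) referred to in the text, although the normalization used in the original equations may differ. The $K$-update involves only topology messages, and with $K = 0$ the $\mu$-update returns $y_i$.

$$
K_{ij}^{(t+1)} = \frac{1 + \sum_{u \in N(i)\setminus j} K_{ui}^{(t)}}{1 + \frac{1}{\beta Q_{ij}}\Big(1 + \sum_{u \in N(i)\setminus j} K_{ui}^{(t)}\Big)},
\qquad
\mu_{ij}^{(t+1)} = \frac{y_i + \sum_{u \in N(i)\setminus j} K_{ui}^{(t)}\,\mu_{ui}^{(t)}}{1 + \sum_{u \in N(i)\setminus j} K_{ui}^{(t)}},
$$

$$
\hat y_i^{(t)} = \frac{y_i + \sum_{u \in N(i)} K_{ui}^{(t)}\,\mu_{ui}^{(t)}}{1 + \sum_{u \in N(i)} K_{ui}^{(t)}} .
$$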
This parametrization of BP yields two-dimensional real-valued messages consisting of a topology message $K$ and a local state update $\mu$. The notation $X_{ij}^{(t)}$ means that message $X$ is sent from originating node $i$ to target node $j$ at iteration step $t$. $N(i)$ is the set of all neighbors of node $i$, and $N(i)\setminus j$ is the set of all neighbors of $i$ except $j$. The algorithm is said to have attained consensus if the messages are fixed points of (2) and (3). A belief for the average $\bar y$ at time $t$ and node $i$ is obtained from the CP belief (4).




FIG. 2: Convergence of the $\bar y$ belief at one node in a random network with 500 nodes. The solid line is the CP performance; the dashed line indicates the correct average $\bar y$. The inset shows the behaviour in rounds 1-10, and the main figure shows the behaviour in rounds $10^3$ to $10^4$. Node values were generated randomly, and then scaled by 90% in round $5 \cdot 10^3$. At iteration step $n = 0$ all CP messages were initialized uniformly to zero. Fast convergence followed by an overshoot (damped oscillation) is observed. After the reset at $n = 5 \cdot 10^3$ the CP messages were left unchanged. A markedly slower convergence, without an overshoot, is then observed.
The consensus beliefs (4), with $K$ and $\mu$ at a fixed point of (2) and (3), are the Belief Propagation predictions of the modes of the one-node marginals of the probability distribution (1).
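A minimal Python sketch of the message passing follows. It assumes that the update rules and the belief take the form given by Moallemi & Van Roy [6]. The topology update is $K_{ij} = s/(1 + s/(\beta Q_{ij}))$ and the local update is $\mu_{ij} = (y_i + \sum K_{ui}\mu_{ui})/s$, with $s = 1 + \sum_{u \in N(i)\setminus j} K_{ui}$. The 6-node graph, the random $y_i$ and $Q_{ij}$, and the helpers `cp_step`, `cp_belief` and `exact_mode` are hypothetical illustrations, not the authors' code. The sketch checks the two properties stated above. First, at a fixed point the beliefs coincide with the modes of the Gaussian model; here the mode is solved directly from $(I + \beta L_Q)x = y$, with $L_Q$ the $Q$-weighted graph Laplacian. Second, for large $\beta$ the beliefs lie close to the plain average.

```python
import numpy as np

def cp_step(adj, y, Q, beta, K, mu):
    """One synchronous CP update of every message (assumed form of the rules in [6]).

    K[(i, j)] and mu[(i, j)] are the topology and local-state messages from i to j.
    """
    K_new, mu_new = {}, {}
    for (i, j) in K:
        others = [u for u in adj[i] if u != j]            # N(i) \ j
        s = 1.0 + sum(K[(u, i)] for u in others)
        K_new[(i, j)] = s / (1.0 + s / (beta * Q[(i, j)]))
        mu_new[(i, j)] = (y[i] + sum(K[(u, i)] * mu[(u, i)] for u in others)) / s
    return K_new, mu_new

def cp_belief(adj, y, K, mu):
    """Consensus belief at every node: the local estimate of the average of y."""
    return np.array([
        (y[i] + sum(K[(u, i)] * mu[(u, i)] for u in adj[i]))
        / (1.0 + sum(K[(u, i)] for u in adj[i]))
        for i in range(len(y))
    ])

def exact_mode(adj, y, Q, beta):
    """Mode of exp(-sum_i (x_i - y_i)^2 - beta * sum_edges Q_ij (x_i - x_j)^2)."""
    J = np.eye(len(y))
    for i in adj:
        for j in adj[i]:                                    # each ordered pair adds its part
            J[i, i] += beta * Q[(i, j)]                     # of the Q-weighted Laplacian
            J[i, j] -= beta * Q[(i, j)]
    return np.linalg.solve(J, y)

# Hypothetical 6-node graph with loops; y and Q are random test data.
adj = {0: [1, 2], 1: [0, 2, 3], 2: [0, 1, 4], 3: [1, 4, 5], 4: [2, 3, 5], 5: [3, 4]}
rng = np.random.default_rng(1)
y = rng.random(6)
Q = {}
for i in adj:
    for j in adj[i]:
        if i < j:
            Q[(i, j)] = Q[(j, i)] = rng.uniform(0.5, 2.0)
beta = 100.0

K = {(i, j): 0.0 for i in adj for j in adj[i]}     # all messages initialized to zero
mu = dict(K)
for t in range(20000):
    K, mu = cp_step(adj, y, Q, beta, K, mu)
x = cp_belief(adj, y, K, mu)
print(np.abs(x - exact_mode(adj, y, Q, beta)).max())   # difference to the exact mode
print(np.abs(x - y.mean()).max())                       # distance to the true average
```

Many rounds are needed because, with $\beta = 100$, the local state messages relax slowly. This is the slow mode analysed below.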

Convergence for different initializations - Figure 2 shows the performance of Consensus Propagation after initializing all messages to zero. The algorithm first oscillates, converging quickly to a good approximation of the correct mean $\bar y$. After every node value is changed, without re-initializing the messages, the algorithm exhibits a steady, yet much slower, convergence. This second behaviour corresponds to the case of dynamic data. Here the topology messages ($K$-messages) and local state updates ($\mu$-messages) start at the values they had converged to before the perturbation. Once a fixed point $K^*$ is reached, the topology messages will not change if only the local measurement values $y_i$ change, since (2) involves only topology messages. Except for an initial transient, the dynamic data case can hence be completely understood through the linear operator expressed by the right-hand side of (3) (see below). More generally, the topology messages appear to converge much faster than the local state messages. The linear theory (explained below) therefore also bounds the behaviour of a dynamic network, where both the local values $y_i$ and the local couplings $Q_{ij}$ change. Before we turn to the linear analysis, we point out that different initializations of the messages yield different performance. Perhaps surprisingly, initializing with $K = \mu = 0$ seems to be the superior choice. The observations of Fig. 3 contradict a conjecture put forward
FIG. 3: Convergence behaviour of Consensus Propagation on a random graph for different initial messages. As model we used a random graph with 80 nodes, all edges present with probability 0.1, $\beta = 100$, and $Q_{ij}$ chosen as i.i.d. random variables uniform between 0.5 and 2. The plot shows the time evolution of the deviation of two messages from their converged values ($K^* - K$ and $\mu^* - \mu$), sent from node 15 to node 10 during 500 iterations. The messages were initialized proportional to their fixed-point values, and start at the top right corner of each trajectory in the figure. The trajectories exhibit an initial fast decay of the error in the topology messages (abscissa), followed by a slower decay of the local state message (ordinate). The only exception is when the messages are initialized as $K = \mu = 0$. In that case the trajectory seems to fall into the (more) stable manifold of the fixed point (a "direct hit"), and the second, slow process along the ordinate is absent. For a graphical illustration of the conjectured behaviour, see Fig. 4. The fixed point in this example has $K^*_{15,10} = 62.61$ and $\mu^*_{15,10} = 3.95$.



FIG. 4: Convergence scheme for Consensus Propagation. The $K$-subspace is a fast stable manifold, and the $\mu$-subspace a slow one.



Theory of Consensus Propagation - Consensus Propagation can be considered as a non-linear dynamical system in a multidimensional space spanned by all $K$- and $\mu$-messages, iterated as $(K^{(t+1)}, \mu^{(t+1)}) = R(K^{(t)}, \mu^{(t)})$.
in [6] that convergence times for $K = 0$ and $K = K^*$ are equivalent. In fact, initializing with $K = 0$ improves convergence dramatically. Note that if $K$ is re-initialized to zero, the re-initialization of $\mu$ is arbitrary: by (3), $\mu_{ij}^{(t+1)}$ will then equal $y_i$, independent of $\mu^{(t)}$. In a scenario where many measurement values (and/or the underlying network) change simultaneously, re-starting Consensus Propagation with $K = 0$ may therefore be a valid option. We stress that this is not obvious. It follows only if the dynamic behaviour is as illustrated in Fig. 4, which may not hold for all underlying topologies.

For locally tree-like topologies, such as the random graphs in Fig. 3, there is a heuristic explanation for the faster convergence of Consensus Propagation when initializing with $K = 0$. As was shown in [6], Consensus Propagation yields the exact node average on tree-like graphs with the global coupling constant $\beta = \infty$ and initial messages $K = 0$. Initializing CP on a random graph with $K = 0$ and a large value of $\beta$ yields, after a finite number of iterations, nearly the same messages as initializing CP with $K = 0$ at $\beta = \infty$ on a computational tree associated with the graph (using the construction of [?]). This explains the improved behaviour of starting with $K = 0$ qualitatively. It does not explain it quantitatively, i.e. the apparent complete absence of the slow process in Fig. 3.
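The two re-initialization options can be compared with a short, vectorized sketch, again assuming the CP update rules of [6]. The random graph $G(30, p = 8/30)$, the seed, and the names `cp_map`, `mu_fixed_point` and `rounds_to_converge` are hypothetical. The sketch first checks the remark above that with $K = 0$ the new $\mu$-messages equal $y_i$ regardless of the old $\mu$. It then mimics the data change of Fig. 2 (all values scaled by 0.9) and counts the rounds needed in two cases: keeping the converged messages, or restarting from $K = \mu = 0$. The counts depend on the graph and the tolerance, and the sketch makes no claim about which is smaller.

```python
import numpy as np

rng = np.random.default_rng(4)
n, c, beta = 30, 8, 100.0
und = [(i, j) for i in range(n) for j in range(i + 1, n) if rng.random() < c / n]
edges = und + [(j, i) for (i, j) in und]        # message e = (i, j) is sent from i to j
index = {e: k for k, e in enumerate(edges)}
nbrs = {i: [] for i in range(n)}
for (i, j) in edges:
    nbrs[i].append(j)
m = len(edges)
M = np.zeros((m, m))                            # M[e, f] = 1 if f = (u, i) feeds e = (i, j), u != j
for e, (i, j) in enumerate(edges):
    for u in nbrs[i]:
        if u != j:
            M[e, index[(u, i)]] = 1.0
Qu = {e: rng.uniform(0.5, 2.0) for e in und}
q = np.array([Qu[(min(e), max(e))] for e in edges])

def cp_map(K, mu, y_src):
    """Vectorized CP update (assumed form of [6]); y_src[e] = y_i for e = (i, j)."""
    s = 1.0 + M @ K
    return s / (1.0 + s / (beta * q)), (y_src + M @ (K * mu)) / s

y = rng.random(n)
y_src = np.array([y[i] for (i, _) in edges])

# With K = 0 the new mu-messages equal y_i, whatever mu was before.
_, mu1a = cp_map(np.zeros(m), rng.random(m), y_src)
_, mu1b = cp_map(np.zeros(m), 100.0 * rng.random(m), y_src)

K_star = np.zeros(m)
for _ in range(500):                            # the K-update ignores mu and y
    s = 1.0 + M @ K_star
    K_star = s / (1.0 + s / (beta * q))

def mu_fixed_point(y_src):
    """Fixed point of the linear mu-update at K*."""
    s = 1.0 + M @ K_star
    A = M * K_star[None, :] / s[:, None]
    return np.linalg.solve(np.eye(m) - A, y_src / s)

def rounds_to_converge(K, mu, y_src, tol=1e-6, max_iter=50000):
    """Number of CP rounds until mu is within tol of its fixed point for y_src."""
    target = mu_fixed_point(y_src)
    for t in range(max_iter):
        if np.abs(mu - target).max() < tol:
            return t
        K, mu = cp_map(K, mu, y_src)
    return max_iter

# Converge on y, then scale all data by 0.9 (cf. Fig. 2) and compare two restarts.
mu_old = mu_fixed_point(y_src)
y_new = 0.9 * y_src
n_warm = rounds_to_converge(K_star, mu_old, y_new)             # keep old messages
n_cold = rounds_to_converge(np.zeros(m), np.zeros(m), y_new)   # restart from K = mu = 0
print("rounds keeping messages:", n_warm, "| rounds restarting from zero:", n_cold)
```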



The numerical experiments above indicate that the $\mu$-message subspace spans a slow stable manifold and the $K$-message subspace a fast stable manifold (see Fig. 4). We will use the eigenvalues of a linearized version of $R$ to verify this. In the following, we refer to the non-linear iterated map $R$ as the transfer operator, or Ruelle-Perron-Frobenius operator. The linear part of $R$ has the matrix representation:
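The entries of this linearization can be derived under an assumption: that the update rules take the form of [6], with $s_{ij} = 1 + \sum_{v \in N(i)\setminus j} K^*_{vi}$. Differentiating at a fixed point $(K^*, \mu^*)$ then gives the nonzero entries below, for $u \in N(i)\setminus j$; all other entries vanish. The derivation holds only under that assumed parametrization. Because the topology update does not involve $\mu$, the $\partial K/\partial \mu$ block is zero. The linearization is therefore block triangular, and its eigenvalues are those of $A$ together with those of $B$.

$$
A_{ij,ui} = \frac{K^*_{ui}}{s_{ij}}, \qquad
C_{ij,ui} = \frac{\mu^*_{ui} - \mu^*_{ij}}{s_{ij}}, \qquad
B_{ij,ui} = \Big(1 + \frac{s_{ij}}{\beta Q_{ij}}\Big)^{-2}, \qquad
\frac{\partial K_{ij}^{(t+1)}}{\partial \mu_{ui}^{(t)}} = 0 .
$$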
$R'$ is the linearized transfer operator. Its matrix representation can be decomposed into four square submatrices, with the $\mu$-messages ordered first:

$$
R' = \begin{pmatrix} A & C \\ 0 & B \end{pmatrix} \qquad (8)
$$



Submatrix A is the transfer matrix in the dynamic data 
case, when the topology messages have converged, sub- 
matrix B is the linear part of the transfer matrix act- 
ing in the dynamic network on the topology messages 
FIG. 5: Length of the projections of the (normalized) eigenvectors onto the $\mu$-subspace, for each eigenvalue $\lambda$ of the linearized Ruelle-Perron-Frobenius operator $R'$, for a $G(N = 20, c = 8)$ Erdos-Renyi model. $\beta = 100$, and $Q_{ij}$ randomly generated as in Fig. 3.
N     c     λ_max(A)      λ_max(B)
20    18    0.99949152    0.00054415
30    14    0.99924356    0.00083034
40    10    0.99895415    0.00119833
50     8    0.99851962    0.00186674

TABLE I: Comparison of the leading eigenvalues of the linearized matrices A and B in four Erdos-Renyi graphs $G(N, p = c/N)$. The much smaller eigenvalues of matrix B imply much faster convergence of the topology messages.

alone, and submatrix C is the linear action of the topology messages on the local state variables. Around the fixed point, we can verify that topology messages converge faster than local state updates. To do so, we compare the size of the eigenvalues of $R'$ with the projection of the corresponding eigenvectors onto the subspace spanned by the $\mu$-messages. As shown in Fig. 5, the (isolated) largest eigenvalue lies in the subspace of local state updates. In addition, most of the other eigenvectors in the subspace of local updates also have eigenvalues larger than all the eigenvalues projecting onto the topology messages. Table I compares the leading eigenvalues of submatrices A and B for four different Erdos-Renyi graphs, reinforcing the observation from Fig. 5. In linear theory, the limiting factor on convergence is therefore the dynamics of the local state updates.
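The block structure and the comparison of Table I can be checked numerically. The sketch below assumes the CP update rules of [6] and uses a hypothetical $G(N = 20, c = 8)$ sample with $\beta = 100$ and $Q_{ij}$ uniform in $[0.5, 2]$. It builds the transfer map on the stacked state $(\mu, K)$, finds its fixed point, and linearizes it by central finite differences. Since the parametrization is an assumption, the printed numbers are not expected to reproduce Table I exactly. The qualitative statements checked are that $\partial K/\partial\mu = 0$ and that $\lambda_{\max}(B) < \lambda_{\max}(A) < 1$.

```python
import numpy as np

rng = np.random.default_rng(0)
n, c, beta = 20, 8, 100.0                     # G(N = 20, c = 8), beta = 100 as in Fig. 5
und = [(i, j) for i in range(n) for j in range(i + 1, n) if rng.random() < c / n]
edges = und + [(j, i) for (i, j) in und]
index = {e: k for k, e in enumerate(edges)}
nbrs = {i: [] for i in range(n)}
for (i, j) in edges:
    nbrs[i].append(j)
m = len(edges)
M = np.zeros((m, m))                          # M[e, f] = 1 if f = (u, i) feeds e = (i, j), u != j
for e, (i, j) in enumerate(edges):
    for u in nbrs[i]:
        if u != j:
            M[e, index[(u, i)]] = 1.0
Qu = {e: rng.uniform(0.5, 2.0) for e in und}
q = np.array([Qu[(min(e), max(e))] for e in edges])
y = rng.random(n)
y_src = np.array([y[i] for (i, _) in edges])

def R(z):
    """Transfer map on the stacked state z = (mu, K), assumed form of the CP rules."""
    mu, K = z[:m], z[m:]
    s = 1.0 + M @ K
    return np.concatenate([(y_src + M @ (K * mu)) / s, s / (1.0 + s / (beta * q))])

# Fixed point: K* by iteration (fast), then mu* from the linear equation mu = b + A mu.
K = np.zeros(m)
for _ in range(500):
    s = 1.0 + M @ K
    K = s / (1.0 + s / (beta * q))
s = 1.0 + M @ K
A = M * K[None, :] / s[:, None]
mu = np.linalg.solve(np.eye(m) - A, y_src / s)
z0 = np.concatenate([mu, K])

# Central finite differences give the linearization R' column by column.
h = 1e-6
J = np.empty((2 * m, 2 * m))
for k in range(2 * m):
    dz = np.zeros(2 * m)
    dz[k] = h
    J[:, k] = (R(z0 + dz) - R(z0 - dz)) / (2 * h)

def spectral_radius(X):
    return np.abs(np.linalg.eigvals(X)).max()

A_fd, C_fd = J[:m, :m], J[:m, m:]
Z_fd, B_fd = J[m:, :m], J[m:, m:]
print("dK/dmu block is zero:", np.allclose(Z_fd, 0.0))
print(f"lambda_max(A) = {spectral_radius(A_fd):.6f}, "
      f"lambda_max(B) = {spectral_radius(B_fd):.6f}")
```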

Dynamic data case - The case in which the topology messages have converged is also of interest when the data to be measured keep changing. In this scenario CP is a linear averaging process. Indeed, the local state update equation (3) is then a linear equation in one free vector $\mu = (\mu_{ij})$, and it can be expressed in linear operator form:

$$
\mu^{(t+1)} = b + A\,\mu^{(t)} \qquad (9)
$$

The operator A in (9), acting on $\mu$-messages, is the same as the submatrix A of the operator $R'$ in (8). Its spectral properties are as described in Fig. 5: it carries the rightmost set of eigenvalues, all lying completely in the subspace of local state updates. The vector $b$ is the $\mu$-independent part of (3), and in particular depends on the set of local measurement values $y$. If these change in time, (9) is clearly a linear averaging process with kernel A. If the $y$ do not change and the $\mu$-messages are initialized in some manner, Fig. 5 suggests that convergence will eventually be dominated by the largest (isolated) eigenvalue of A. Fig. 6 shows that this is indeed the case for several different Erdos-Renyi graphs. In these models, we always find an isolated largest eigenvalue (data not shown).
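The following is a minimal sketch of the dynamic data process. It assumes the local state update of [6], so that at converged topology messages $A_{ij,ui} = K^*_{ui}/s_{ij}$ and $b_{ij} = y_i/s_{ij}$, with $s_{ij} = 1 + \sum_{u \in N(i)\setminus j} K^*_{ui}$. The graph is a hypothetical sample of case 1 of Fig. 6 ($c = 8$, $N = 50$). The sketch checks that the CP $\mu$-update equals $b + A\mu$. It then iterates (9) with fixed $y$ and compares the asymptotic ratio of successive errors with the leading eigenvalue of A.

```python
import numpy as np

rng = np.random.default_rng(2)
n, c, beta = 50, 8, 100.0                     # case 1 of Fig. 6: c = 8, N = 50
und = [(i, j) for i in range(n) for j in range(i + 1, n) if rng.random() < c / n]
edges = und + [(j, i) for (i, j) in und]      # message e = (i, j) is sent from i to j
index = {e: k for k, e in enumerate(edges)}
nbrs = {i: [] for i in range(n)}
for (i, j) in edges:
    nbrs[i].append(j)
m = len(edges)
M = np.zeros((m, m))                          # M[e, f] = 1 if f = (u, i) feeds e = (i, j), u != j
for e, (i, j) in enumerate(edges):
    for u in nbrs[i]:
        if u != j:
            M[e, index[(u, i)]] = 1.0
Qu = {e: rng.uniform(0.5, 2.0) for e in und}
q = np.array([Qu[(min(e), max(e))] for e in edges])
y = rng.random(n)
y_src = np.array([y[i] for (i, _) in edges])  # y_i attached to every message leaving i

# Converged topology messages K* (the K-update involves neither mu nor y).
K = np.zeros(m)
for _ in range(500):
    s = 1.0 + M @ K
    K = s / (1.0 + s / (beta * q))
s = 1.0 + M @ K

A = M * K[None, :] / s[:, None]               # kernel: A[e, f] = K*_f / s_e
b = y_src / s                                 # mu-independent part, carries the data

# The CP mu-update at K* coincides with the linear map b + A mu.
mu_test = rng.random(m)
print(np.allclose((y_src + M @ (K * mu_test)) / s, b + A @ mu_test))

# Iterate (9) with fixed data and compare the error ratio with lambda_max(A).
mu_star = np.linalg.solve(np.eye(m) - A, b)
mu = np.zeros(m)
errs = []
for _ in range(3000):
    mu = b + A @ mu
    errs.append(np.linalg.norm(mu - mu_star))
q_ratio = errs[-1] / errs[-2]
lam_max = np.abs(np.linalg.eigvals(A)).max()
print(f"convergence ratio q = {q_ratio:.6f}, lambda_max(A) = {lam_max:.6f}")
```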
FIG. 6: Convergence ratios of the linear averaging process (9) compared with the numerically calculated leading eigenvalue $\lambda_{\max}$ of operator A, in four examples of Erdos-Renyi models $G(N, p = c/N)$. $q$: convergence ratio; $\lambda_{\max}$: largest eigenvalue. The solid line represents $q = \lambda_{\max}$. Cases: 1: $c = 8$, $N = 50$; 2: $c = 10$, $N = 40$; 3: $c = 14$, $N = 30$; 4: $c = 18$, $N = 20$. In all cases $\beta = 100$, and $Q_{ij}$ randomly generated as in Fig. 3.

Scalability of CP in Erdos-Renyi graphs - The above discussion leads to the conclusion that the largest eigenvalue of operator A of (9) is a quantity of major importance for understanding the performance of CP in dynamic environments, both with dynamic data only and with a dynamic network. The scaling properties of this largest eigenvalue therefore determine how effective the CP averaging procedure can be in a large network. Following the general principles of random graph theory, we compare random graphs of increasing size $N$ but with the same average node degree $c$. This means that every link is present in the graph with probability $p = c/N$ (up to corrections decaying with $N$). Table II shows that in a family of Erdos-Renyi graphs with asymptotic average node degree $c = 8$, the largest eigenvalue of A seems to converge to a finite value less than one. In the experiments, the local couplings $Q_{ij}$ are generated randomly between 0.5 and 2. The fifth column gives (for the smaller instances) the standard deviation of the largest eigenvalue computed from 100 experiments, using independent realizations of the random graphs and of the local coupling constants $Q_{ij}$. The decay of the standard deviation with $N$ indicates that the leading eigenvalue is a self-averaging quantity in this ensemble. Let us note that our results
N       p        c_exp   [c_exp]_100   σ[c_exp]_100   λ_max     [λ_max]_100
20      0.4      7.3     7.6           0.6            0.9984    0.9985
40      0.2      7.7     7.8           0.5            0.9985    0.9985
80      0.1      7.8     7.9           0.4            0.9986    0.9985
160     0.05     8.2     8.0           0.3            0.9987    0.9985
5000    0.0016   -       -             -              0.99850   -
10000   0.0008   -       -             -              0.99851   -
20000   0.0004   -       -             -              0.99850   -

TABLE II: Convergence ratios for Erdos-Renyi graphs. Coupling constant $\beta = 100$, and $Q_{ij}$ chosen randomly uniform between 0.5 and 2. All instances have a theoretical average node degree $c = 8$. The table shows the outcome of a single experiment ($c_{exp}$, $\lambda_{\max}$) and, for small graphs, of 100 experiments ($[c_{exp}]_{100}$, $\sigma[c_{exp}]_{100}$, $[\lambda_{\max}]_{100}$).

concur with (and extend) a result of [6] for regular graphs, where the authors showed that convergence time does not depend on the graph size. If so, the leading eigenvalue in that ensemble must also be a self-averaging quantity, independent of graph size.
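The finite-size study of Table II can be sketched under the same assumed CP rules of [6]. The sketch samples $G(N, p = c/N)$ with $c = 8$, draws $Q_{ij}$ uniform in $[0.5, 2]$, sets $\beta = 100$, and converges $K^*$. It then estimates $\lambda_{\max}(A)$ by power iteration. This is appropriate here because A is non-negative and, as reported above, its largest eigenvalue is isolated. The function `leading_eigenvalue`, the seed and the sample sizes are hypothetical choices. The printed means and spread across realizations are a way to probe self-averaging, not a reproduction of the published numbers.

```python
import numpy as np

def leading_eigenvalue(n, c, beta, rng, iters=500):
    """Estimate lambda_max(A) for one sample of G(n, p = c/n) with Q_ij ~ U(0.5, 2)."""
    und = [(i, j) for i in range(n) for j in range(i + 1, n) if rng.random() < c / n]
    edges = und + [(j, i) for (i, j) in und]
    index = {e: k for k, e in enumerate(edges)}
    nbrs = {i: [] for i in range(n)}
    for (i, j) in edges:
        nbrs[i].append(j)
    m = len(edges)
    M = np.zeros((m, m))                       # M[e, f] = 1 if f = (u, i) feeds e = (i, j), u != j
    for e, (i, j) in enumerate(edges):
        for u in nbrs[i]:
            if u != j:
                M[e, index[(u, i)]] = 1.0
    Qu = {e: rng.uniform(0.5, 2.0) for e in und}
    q = np.array([Qu[(min(e), max(e))] for e in edges])
    K = np.zeros(m)
    for _ in range(iters):                     # converge the topology messages to K*
        s = 1.0 + M @ K
        K = s / (1.0 + s / (beta * q))
    A = M * K[None, :] / (1.0 + M @ K)[:, None]
    v = np.ones(m)
    lam = 0.0
    for _ in range(iters):                     # power iteration on the non-negative kernel A
        w = A @ v
        lam = np.linalg.norm(w) / np.linalg.norm(v)
        v = w / np.linalg.norm(w)
    return lam

rng = np.random.default_rng(3)
for n in (20, 40, 80):
    samples = [leading_eigenvalue(n, 8, 100.0, rng) for _ in range(5)]
    print(f"N = {n:3d}: mean lambda_max = {np.mean(samples):.5f}, "
          f"std = {np.std(samples):.1e}")
```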

Fig. 7 shows the dependence of the leading eigenvalue on the node degree $c$, for a number of graphs with 20 nodes. The eigenvalue shows an increasing trend, which in this range is fairly well approximated by a logarithmic behaviour.
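The slowest linear mode decays like $\lambda_{\max}^t$. The number of rounds $t$ needed to reduce the error by a factor $\varepsilon$ is therefore about $\ln\varepsilon / \ln\lambda_{\max}$. The small helper below (a hypothetical name) evaluates this for the value $\lambda_{\max} \approx 0.9985$ of Table II and for the fit of Fig. 7, assuming the fit uses the natural logarithm.

```python
import math

def iterations_to_reduce(lam, eps):
    """Smallest t with lam**t <= eps: rounds for a mode with eigenvalue lam
    to shrink by the factor eps (0 < lam < 1, 0 < eps < 1)."""
    return math.ceil(math.log(eps) / math.log(lam))

print(iterations_to_reduce(0.9985, 1e-3))   # 4602 rounds for lambda_max of Table II

# Fig. 7 fit (N = 20), assuming log() there is the natural logarithm.
for c in (5, 10, 20):
    q = 0.001046 * math.log(c) + 0.9965
    print(c, round(q, 5), iterations_to_reduce(q, 1e-3))
```

If the leading eigenvalue is indeed size-independent, so is this round count, which is the practical meaning of the scalability claim.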
FIG. 7: Dependence of the convergence ratio $q$ on the average node degree $c$ in an Erdos-Renyi graph with 20 nodes. The solid line is a data fit: $q = 0.001046 \cdot \log(c) + 0.9965$.

Summary - Statistical physics has contributed very significantly in recent years to the understanding of Belief Propagation approaches to inference, which have very important applications, e.g. to iterative decoding [?]. In this contribution, we have looked at a Belief Propagation-based algorithm for averaging, with potentially numerous applications to network management. We showed that this Consensus Propagation algorithm, in a dynamic environment, is a dynamical system that can be fruitfully analysed with the tools of statistical physics and non-linear dynamics. We showed that CP responds quickly to changes in the network topology, and more slowly to changing data. This can be understood intuitively: a dynamic network improves mixing, which should not be a disadvantage when computing an average (or an estimate of an average). In a real-world application, CP is therefore limited not by a changing network structure but by dynamic data. Secondly, and of interest to statistical physicists, we exhibited an interesting self-averaging property of the leading eigenvalue of the transfer matrix describing the dynamic data case. Perhaps surprisingly, this leading eigenvalue seems to be asymptotically independent of network size.

Acknowledgement - R.P. was partially funded by the 
European "Life Long Learning Program" under project 
number DE-2008-ERA/MOB-KonsZuV01-CP07. E.A. 
acknowledges support from the Swedish Science Council 
through the KTH Linnaeus Centre ACCESS, and from 
the Academy of Finland. 



[1] Pearl, "Probabilistic Reasoning in Intelligent Systems", Morgan Kaufmann, 1988.

[2] Cowell, "Advanced Inference in Bayesian Networks", in "Learning in Graphical Models", edited by Michael Jordan, Kluwer Academic Publishers, 1998.

[3] Kschischang, Frey and Loeliger, "Factor Graphs and the Sum-Product Algorithm", IEEE Transactions on Information Theory, vol. 47, pp. 498-519, February 2001.

[4] Yedidia, Freeman and Weiss, "Understanding Belief Propagation and its Generalizations", in "Exploring Artificial Intelligence in the New Millennium" (Science & Technology Books, TR2001-022), January 2003.

[5] Mezard and Montanari, "Information, Physics, and Computation", Oxford University Press, 2009.

[6] Moallemi and Van Roy, "Consensus Propagation", IEEE Transactions on Information Theory, vol. 52, no. 11, November 2006.

[7] Jelasity, Montresor and Babaoglu, "Gossip-based Aggregation in Large Dynamic Networks", ACM Transactions on Computer Systems, vol. 23, no. 3, pp. 219-252, August 2005.

[8] Weiss and Freeman, "Correctness of Belief Propagation in Gaussian Graphical Models of Arbitrary Topology", Neural Computation, vol. 13, pp. 2173-2200, 2001.

[9] Malioutov, Johnson and Willsky, "Walk-Sums and Belief Propagation in Gaussian Graphical Models", Journal of Machine Learning Research, vol. 7, pp. 2031-2064, 2006.

[10] Weiss and Freeman, "Correctness of Belief Propagation in Gaussian Graphical Models of Arbitrary Topology", Neural Computation, vol. 13, no. 10, pp. 2173-2200, October 2001.

[11] Murphy, Weiss and Jordan, "Loopy Belief Propagation for Approximate Inference: An Empirical Study", Proceedings of the 15th Annual Conference on Uncertainty in Artificial Intelligence (UAI-99), Morgan Kaufmann, pp. 467-475, 1999.



